perm filename CH4A[206,JMC] blob sn#005356 filedate 1972-05-04 generic text, type T, neo UTF8
00100	COMPILING IN LISP
00200	
00300	
00400		Translating  programs  from  a  source language into a target
00500	language is  the  most  common  form  of  computation  with  symbolic
00600	expressions.     Because  of  its  commonness,  and  because  general
00700	computation  with  symbolic  expressions  is  not   much   practiced,
00800	compiling  is  generally treated separately from the rest of symbolic
00900	computation.  Nevertheless, it turns out that LISP is a  rather  good
01000	language  for  writing  compilers  and  it  is  especially  good  for
01100	describing compilers in an understandable way.  To that end we  shall
01200	describe   two  compilers,  LCOM0  and  LCOM4,  both  of  which  take
01300	S-expression descriptions of LISP recursive functions  and  translate
01400	them  into  machine  code for the PDP-10 computer.  LCOM0 is about as
01500	simple  a  LISP  compiler  as  can  be  written  (only  one  page  of
01600	M-expressions),  but it compiles rather bulky machine code.  LCOM4 is
01700	based on the same ideas as LCOM0 but recognizes  many  special  cases
01800	and produces much better code.
01900	
02000		Compilers  perform several operations in sequence though some
02100	of them are sometimes combined. These operations are
02200	
02300		1. Parsing.  This consists of reading  the  input  string  of
02400	symbols  and  making a structure in memory that represents the source
02500	program.  Sometimes the program is parsed one statement  at  a  time,
02600	and  further operations are done before the next statement is parsed.
02700	The memory structure sometimes consists of tables,  sometimes  Polish
02800	prefix  or  postfix notation and sometimes is list structure.  In our
02900	case, parsing is trivial, because the LISP  read  program  interprets
03000	the  parentheses, spaces and atomic symbols of the source program and
03100	creates a list structure which can be used by  the  compiler  without
03200	further  parsing.   A  compiler  for LISP M-expressions would require
03300	parsing before compilation, but we would prefer to do this by  giving
03400	a  suitable  syntax  to  a  general  symbolic expression read program
03500	rather than by writing a special parser.
03600	
03700		2.  Sometimes  the  source  code  is  improved  before  being
03800	translated.  We  don't  do that.  All optimization is imbedded in the
03900	translator itself.
04000	
04100		3. Next the source code  is  translated  into  machine  code.
04200	This  is  sometimes  symbolic  assembly  code, sometimes it is binary
04300	code, and sometimes it is relocatable binary code with fixups in  the
04400	case  of  one  pass  compilers.   In  our  case,  the  output is list
04500	structure in the LAP assembly language.  Usually, using assembly code
04600	as an intermediate form wastes time because character strings have to
04700	be generated by the compiler and subsequently read a character  at  a
04800	time  by  the  symbolic assembly program.  In our case, this need not
04900	occur, because within memory, the  LAP  code  is  a  list  structure.
05000	Pointers  to  atoms  are  generated,  by the compiler and read by LAP
05100	without the characters in the print names of  the  atoms  ever  being
05200	scanned.
05300	
05400		4.  Optimization  of  the  object  code is often done, but we
05500	don't do this, because most of the useful optimizations can  be  done
05600	in  the  code  generation  phase  and  the  bad  object code is never
05700	generated.
05800	
05900		5. Finally, the assembly code is translated  into  binary  by
06000	LAP.  In  LISP,  the  translation  is done directly into the space in
06100	memory where the code will eventually run.
06200	
06300		In our opinion, some of the techniques used here  are  better
06400	than  many  of the usual techniques even for compiling languages like
06500	Algol. In  particular,  we  can  recommend  representing  the  source
06600	program as a list structure and doing the optimization along with the
06700	compiling.
06800	
06900	
07000	4.2. An example of compilation.
07100	
07200		We  start by showing the LAP code into which LCOM4 translates
07300	our old friend %alt.  Remember that
07400	
07500		alt x ← if null x ∨ null cdr x then x else a x . alt dd x.
07600	
07700	The corresponding LAP code produced by LCOM4 is
07450	
07500	(LAP ALT SUBR)
07600	(PUSH P 1) 
07700	(MOVE 1 0 P) 
07800	(JUMPE 1 G0186) 
07900	(HRRZ@ 1 0 P) 
08000	(JUMPN 1 G0185) 
08100	G0186 
08200	(MOVE 1 0 P) 
08300	(JRST 0 G0184) 
08400	G0185 
08500	(HRRZ@ 1 0 P) 
08600	(HRRZ@ 1 1) 
08700	(CALL 1 (E ALT)) 
08800	(MOVE 2 1) 
08900	(HLRZ@ 1 0 P) 
09000	(CALL 2 (E CONS)) 
09100	G0184 
09200	(SUB P (C 0 0 1 1)) 
09300	(POPJ P) 
09400	NIL 
09500	
09600	We have the following remarks about this object program.
09700	
09800		1. (LAP ALT SUBR) tells LISP  that  this  is  a  LAP  program
09900	called  ALT  and  that it is of type SUBR.  LAP will assemble it into
10000	binary program space and put a pointer to it on the property list  of
10100	the atom ALT.  The NIL at the end is just punctuation.
10200	
10300		2. The arguments of the function (and there can't be too many
10400	of them) appear in successive accumulators  of  the  PDP-10  starting
10500	with  accumulator  1.   The  PDP-10 computer has 16 accumulators, but
10600	only the 1 thru 5 are used for this purpose.
10700	
10800		3. Within the e
10900